feat: split billing for cached vs non-cached prompt tokens (#4)
`CalculateCost` now accepts `CacheUsage` and bills cached tokens at the `CacheRead` rate instead of the full `Input` rate. This correctly reflects the cost savings from prompt caching (typically 50-90% cheaper).

Changes:
- `BillingResult`: added `CachedTokens` and `CachedInputCost` fields
- `CalculateCost`: accepts `*CacheUsage`, splits cached/non-cached pricing
- `BillingInterceptor`: extracts `cache_usage` from `ResponseMetadata.Custom` and passes it to `CalculateCost`
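The split-billing behavior described above can be sketched as follows. All type and field names here are assumptions inferred from this description, not the actual `billing.go` code:

```go
package main

import "fmt"

// Hypothetical shapes: the real types live in billing.go and metadata.go,
// which are not shown on this page, so every name here is an assumption.
type CacheUsage struct {
	CachedTokens         int // OpenAI-style cached prompt tokens
	CacheReadInputTokens int // Anthropic-style cache-read tokens
}

type Pricing struct {
	Input     float64 // $/token for non-cached prompt tokens
	CacheRead float64 // $/token for cached prompt tokens
	Output    float64 // $/token for completion tokens
}

type BillingResult struct {
	CachedTokens    int
	InputCost       float64
	CachedInputCost float64
	OutputCost      float64
	TotalCost       float64
}

// CalculateCost splits prompt tokens into cached and non-cached portions
// and bills each at its own rate, as the description above outlines.
func CalculateCost(p Pricing, promptTokens, outputTokens int, cu *CacheUsage) BillingResult {
	cached := 0
	if cu != nil {
		// Providers are expected to populate only one of these fields.
		cached = cu.CachedTokens + cu.CacheReadInputTokens
	}
	if cached > promptTokens {
		cached = promptTokens // clamp: never bill more cached tokens than prompt tokens
	}
	cacheRate := p.CacheRead
	if cacheRate <= 0 {
		cacheRate = p.Input // no cache pricing configured: fall back to full rate
	}
	r := BillingResult{
		CachedTokens:    cached,
		InputCost:       float64(promptTokens-cached) * p.Input,
		CachedInputCost: float64(cached) * cacheRate,
		OutputCost:      float64(outputTokens) * p.Output,
	}
	r.TotalCost = r.InputCost + r.CachedInputCost + r.OutputCost
	return r
}

func main() {
	p := Pricing{Input: 2.5e-6, CacheRead: 1.25e-6, Output: 10e-6}
	res := CalculateCost(p, 2000, 100, &CacheUsage{CachedTokens: 1920})
	fmt.Printf("cached=%d total=$%.6f\n", res.CachedTokens, res.TotalCost)
}
```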
Caution: Review failed. The pull request is closed.
📝 Walkthrough

`CalculateCost` now accepts an optional `CacheUsage` and tracks cached prompt tokens and cached input cost in `BillingResult`; cached tokens are clamped to prompt tokens, split from non-cached tokens, and billed using cache-read pricing when available. `BillingInterceptor` extracts cache usage from response metadata and forwards it. New unit tests verify these behaviors.
🚥 Pre-merge checks: ✅ 1 passed
🧹 Nitpick comments (1)
billing.go (1)
50-61: Cached token normalization assumes mutual exclusivity.

The addition `cacheUsage.CachedTokens + cacheUsage.CacheReadInputTokens` relies on each provider populating only one of these fields. While the clamping logic (lines 59-61) provides a safety net, if a provider or future API change populates both fields, cached tokens would be initially double-counted before clamping silently corrects it. This is likely fine given current provider behaviors, but consider adding a brief comment noting this assumption for future maintainers.
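A sketch of the normalization with the explanatory comment this nitpick asks for; the helper name and field names are assumed from the review text, not the real code:

```go
package main

import "fmt"

type CacheUsage struct {
	CachedTokens         int
	CacheReadInputTokens int
}

// normalizeCachedTokens sums the two provider-specific cache fields and
// clamps the result to promptTokens.
//
// Assumption: providers populate only one of CachedTokens or
// CacheReadInputTokens, so the sum picks up whichever is set. The clamp
// against promptTokens is a safety net that prevents overcounting in
// case both fields are ever populated.
func normalizeCachedTokens(cu *CacheUsage, promptTokens int) int {
	if cu == nil {
		return 0
	}
	cached := cu.CachedTokens + cu.CacheReadInputTokens
	if cached > promptTokens {
		cached = promptTokens
	}
	return cached
}

func main() {
	fmt.Println(normalizeCachedTokens(&CacheUsage{CachedTokens: 1920}, 2000))         // 1920
	fmt.Println(normalizeCachedTokens(&CacheUsage{CacheReadInputTokens: 3000}, 2000)) // clamped to 2000
}
```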
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@billing.go` around lines 50 - 61, The code currently sums cacheUsage.CachedTokens and cacheUsage.CacheReadInputTokens into cachedTokens which assumes those fields are mutually exclusive; add a brief explanatory comment above that line (near where cachedTokens is computed) stating the assumption that providers populate only one of CachedTokens or CacheReadInputTokens and that we clamp against promptTokens as a safety net—this helps future maintainers understand why the sum is used and that clamping prevents overcounting if both fields are ever set.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: f3c23aad-4492-4551-a83b-edb402ac9848
📒 Files selected for processing (2)
billing.go
interceptors/billing.go
📜 Review details
🔇 Additional comments (5)
billing.go (4)
32-44: LGTM!

The `BillingResult` struct extensions are well-documented. The new `CachedTokens` and `CachedInputCost` fields clearly separate cached token accounting from non-cached, and the updated comments accurately reflect the new semantics.
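The extended result struct might look roughly like this. Only `CachedTokens` and `CachedInputCost` come from the review; the remaining fields and comments are assumed for illustration:

```go
package main

import "fmt"

// BillingResult sketch with the extensions this comment describes:
// CachedTokens and CachedInputCost separate cached accounting from
// non-cached. All other fields are hypothetical.
type BillingResult struct {
	PromptTokens    int     // total prompt tokens (cached + non-cached)
	CachedTokens    int     // prompt tokens served from the provider cache
	OutputTokens    int     // completion tokens
	InputCost       float64 // non-cached prompt tokens at the Input rate
	CachedInputCost float64 // cached prompt tokens at the CacheRead rate
	OutputCost      float64 // completion tokens at the Output rate
	TotalCost       float64 // InputCost + CachedInputCost + OutputCost
}

func main() {
	r := BillingResult{PromptTokens: 2000, CachedTokens: 1920}
	fmt.Printf("non-cached prompt tokens: %d\n", r.PromptTokens-r.CachedTokens) // 80
}
```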
68-76: LGTM!

The fallback logic for cache pricing is well-implemented. Using `cacheRate <= 0` correctly handles both missing (zero-value) and invalid (negative) cache pricing by falling back to the full input rate, which is a sensible default that prevents undercharging.
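The fallback described here can be isolated as a small helper (a sketch, not the actual code; the rates in `main` are illustrative $/token values):

```go
package main

import "fmt"

// effectiveCacheRate returns the per-token rate for cached prompt tokens.
// A zero or negative CacheRead price means no cache pricing is configured
// (or the value is invalid), so we fall back to the full input rate
// rather than undercharging.
func effectiveCacheRate(cacheRead, input float64) float64 {
	if cacheRead <= 0 {
		return input
	}
	return cacheRead
}

func main() {
	fmt.Println(effectiveCacheRate(1.25e-6, 2.5e-6)) // cache pricing present: use it
	fmt.Println(effectiveCacheRate(0, 2.5e-6))       // missing: fall back to input rate
	fmt.Println(effectiveCacheRate(-1, 2.5e-6))      // invalid: fall back to input rate
}
```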
78-91: LGTM!

The total cost calculation correctly aggregates non-cached input, cached input, and output costs. The `BillingResult` is properly populated with all fields.
49-49: No action needed. The `CacheUsage` type is properly defined in `metadata.go` with all expected fields (`CachedTokens`, `CacheReadInputTokens`, etc.) and is accessible within the package. The function signature is correct.

interceptors/billing.go (1)
41-45: Type assertion is correct, no issue found.

All providers (`openai_compatible`, `bedrock`, `anthropic`, `promptcaching`) consistently store `cache_usage` in `respMeta.Custom` as a value type `llmproxy.CacheUsage`, never as a pointer. The code in `billing.go` correctly asserts against the value type `cu.(llmproxy.CacheUsage)`, and no silent failure occurs.

> Likely an incorrect or invalid review comment.
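A sketch of the extraction this comment verifies, assuming simplified local `CacheUsage` and `ResponseMetadata` types in place of the real `llmproxy` ones:

```go
package main

import "fmt"

type CacheUsage struct {
	CachedTokens int
}

type ResponseMetadata struct {
	Custom map[string]any
}

// extractCacheUsage pulls cache_usage out of ResponseMetadata.Custom.
// Per the review comment, providers store it as a value type (CacheUsage),
// not a pointer, so the assertion targets the value type. A missing key,
// nil map, or unexpected stored type yields nil, which bills the request
// as fully uncached.
func extractCacheUsage(meta *ResponseMetadata) *CacheUsage {
	if meta == nil || meta.Custom == nil {
		return nil
	}
	if cu, ok := meta.Custom["cache_usage"].(CacheUsage); ok {
		return &cu
	}
	return nil
}

func main() {
	meta := &ResponseMetadata{Custom: map[string]any{"cache_usage": CacheUsage{CachedTokens: 1920}}}
	fmt.Println(extractCacheUsage(meta).CachedTokens)          // 1920
	fmt.Println(extractCacheUsage(&ResponseMetadata{}) == nil) // true
}
```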
Tests cover:
- No cache usage (nil): all tokens at full rate
- OpenAI cache hit (`CachedTokens` field)
- Anthropic cache hit (`CacheReadInputTokens` field)
- Cache usage present but zero tokens: treated as no caching
- Cached tokens exceeding prompt tokens: clamped
- No `CacheRead` price: falls back to full input rate
- All tokens cached: zero non-cached cost
- Zero tokens: zero cost
- Mixed provider cache fields (summed)
- Interceptor extracting `cache_usage` from `ResponseMetadata.Custom`
- Interceptor with nil/empty `Custom` map
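Several of the cases above can be expressed as a table-driven check against the normalization logic. This is a self-contained sketch; `cachedInputTokens` is a hypothetical stand-in for the logic inside `CalculateCost`:

```go
package main

import "fmt"

type CacheUsage struct {
	CachedTokens         int
	CacheReadInputTokens int
}

// cachedInputTokens mirrors the normalization under test: sum the
// provider-specific fields, clamp to promptTokens; nil means no caching.
func cachedInputTokens(cu *CacheUsage, promptTokens int) int {
	if cu == nil {
		return 0
	}
	n := cu.CachedTokens + cu.CacheReadInputTokens
	if n > promptTokens {
		n = promptTokens
	}
	return n
}

func main() {
	cases := []struct {
		name   string
		cu     *CacheUsage
		prompt int
		want   int
	}{
		{"nil cache usage", nil, 2000, 0},
		{"OpenAI cache hit", &CacheUsage{CachedTokens: 1920}, 2000, 1920},
		{"Anthropic cache hit", &CacheUsage{CacheReadInputTokens: 1920}, 2000, 1920},
		{"zero cached tokens", &CacheUsage{}, 2000, 0},
		{"clamped to prompt tokens", &CacheUsage{CachedTokens: 3000}, 2000, 2000},
		{"mixed provider fields summed", &CacheUsage{CachedTokens: 500, CacheReadInputTokens: 700}, 2000, 1200},
	}
	for _, c := range cases {
		if got := cachedInputTokens(c.cu, c.prompt); got != c.want {
			fmt.Printf("FAIL %s: got %d want %d\n", c.name, got, c.want)
		} else {
			fmt.Printf("ok   %s\n", c.name)
		}
	}
}
```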
Summary
`CalculateCost` now correctly bills cached prompt tokens at the `CacheRead` rate instead of the full `Input` rate, reflecting the actual cost savings from prompt caching.

Changes
billing.go
- `CalculateCost` accepts a new `*CacheUsage` parameter
- Cached tokens (OpenAI `CachedTokens`, Anthropic `CacheReadInputTokens`, Bedrock) are billed at the `CacheRead` rate
- Non-cached tokens are billed at the full `Input` rate
- Falls back to the full `Input` rate if no `CacheRead` pricing is available
- `BillingResult` now includes `CachedTokens` and `CachedInputCost` fields

interceptors/billing.go
- Extracts `cache_usage` from `ResponseMetadata.Custom` (already set by OpenAI, Anthropic, Bedrock, Fireworks extractors)
- Passes `*CacheUsage` to `CalculateCost`

Pricing Example (gpt-4o, 2000 prompt tokens, 1920 cached)
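A worked version of the pricing example above. The gpt-4o rates ($2.50/M input, $1.25/M cached input) are assumed from commonly quoted pricing and may be out of date; check current provider pricing:

```go
package main

import "fmt"

func main() {
	// Assumed gpt-4o rates, expressed as $ per token (not verified here).
	const (
		inputRate     = 2.50 / 1e6 // non-cached prompt tokens
		cacheReadRate = 1.25 / 1e6 // cached prompt tokens
	)
	promptTokens, cachedTokens := 2000, 1920
	nonCached := promptTokens - cachedTokens // 80

	// Old behavior: everything billed at the full input rate.
	withoutSplit := float64(promptTokens) * inputRate
	// New behavior: cached portion billed at the cache-read rate.
	withSplit := float64(nonCached)*inputRate + float64(cachedTokens)*cacheReadRate

	fmt.Printf("all at input rate:   $%.6f\n", withoutSplit) // $0.005000
	fmt.Printf("cache-aware billing: $%.6f\n", withSplit)    // $0.002600
}
```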
Breaking Change
`CalculateCost` signature changed: callers must pass `*CacheUsage` (or `nil` for no caching).

Summary by CodeRabbit
New Features
Tests